Skip to content

Check the DBM comment region in place, dropping the duplicate-comment substring#11737

Draft
dougqh wants to merge 29 commits into
masterfrom
dougqh/dbcommenter-scan-overload
Draft

Check the DBM comment region in place, dropping the duplicate-comment substring#11737
dougqh wants to merge 29 commits into
masterfrom
dougqh/dbcommenter-scan-overload

Conversation

@dougqh

@dougqh dougqh commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

⛔ DO NOT REVIEW YET — blocked on #11734 and #10640

Draft placeholder. The diff is inflated: this stacks on two not-yet-merged PRs whose contents show up here —

Once both land on master and this is rebased, the diff reduces to this PR's real change. Please hold review until then.

What this actually does (post-rebase)

Drops the per-call sql.substring(...) that SQLCommenter.hasDDComment did just to scan a comment for trace-comment needles. Adds:

  • Strings.regionContains(s, from, to, needle) — an allocation-free, copy-free range check (the primitive).
  • SubSequence.contains(needle) — the natural view method, delegating to the primitive.
  • SharedDBCommenter.containsTraceComment(sql, from, to) — checks the comment body in place as SubSequence.of(sql, from, to).contains(...); the String overload (used by Mongo) delegates to it.

SQLCommenter then checks the comment region in place — no substring. Reads as a 1-to-1 substitution for the idiomatic String form.

Measured — SQLCommenterDuplicateCommentBenchmark (JDK 25, @Threads(8), @Fork(2), -prof gc)

throughput gc.alloc.rate.norm
before (substring) 15.4M ± 3.1M ops/s 140 B/op
after (range/view) 16.8M ± 1.3M ops/s ≈ 0

The allocation delta (140 → 0 B/op) is the win and is exact/fork-stable. Throughput is indicative only — @Fork(2) and the run wasn't on a quiet machine; this path is scan-CPU-dominated, so the win is the allocation, a small cut that compounds across comment-bearing injects. (Clean @Fork(5) re-run pending before ready.)

🤖 Generated with Claude Code

dougqh and others added 28 commits February 19, 2026 10:34
- fast replaceAll for a fixed string & replacement, 3x throughput compared to regex based solutions, 1/2x allocation compared to regex solutions

- added SubSequence which provides a view into a subsequence of a String without incurring extra allocation

- Strings.spliit returns an Iterable<SubSequence> can be used to do light weight processing of a String
These benchmarks exist to show why the APIs are forbidden
…tion)

Per bric3 review: the doc said what SubSequence avoids allocating but not why it
matters. Explain the use case — allocation-free lightweight parsing: substring/
subSequence copy per call, so splitting a string into many pieces on a hot path
allocates O(pieces) Strings; SubSequence is a zero-copy view. Also fixes a typo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SharedDBCommenter.containsTraceComment built nine "KEY + =" strings on every
call: the keys are encode()-derived (not compile-time constants), so each
"KEY + =" is a runtime concat. A non-matching comment ran all nine checks and
allocated nine throwaway Strings per call. Precompute the needles once at class
init as *_EQ constants; the contains-chain is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Throughput benchmark at @threads(8) over a realistic comment-content mix
(mostly non-DD, the all-nine-checks worst case) to surface the per-call
allocation removed by hoisting the "<key>=" needles to constants. Run the
parent commit (baseline) vs this branch and read the ops/s delta; -prof gc
(gc.alloc.rate.norm) corroborates the mechanism.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tent substring

SQLCommenter.hasDDComment materialized sql.substring(commentStart, commentEnd)
on every duplicate-comment check just to feed containsTraceComment. Add a
SharedDBCommenter.containsTraceComment(sql, from, to) range overload that scans
the comment body in place, and a Strings.regionContains primitive it (and the
String delegate) build on -- no per-call substring. The String overload now
delegates to the range form, so Mongo behavior is unchanged.

regionContains is the allocation-free, copy-free primitive; the natural-reading
SubSequence.contains layer can delegate to it later. Boundary semantics unit-
tested on Strings.regionContains; DB needle behavior on the overloads; existing
SQLCommenter/SharedDBCommenter suites unchanged and green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o dougqh/dbcommenter-scan-overload

# Conflicts:
#	internal-api/src/main/java/datadog/trace/util/Strings.java
…diom)

Add SubSequence.contains (delegating to the Strings.regionContains primitive)
and rewrite containsTraceComment(sql, from, to) as
SubSequence.of(sql, from, to).contains(...) -- a 1-to-1 substitution for what
you'd idiomatically write on a substring, with no copy. EA elides the view in
the transient consumption; even if it doesn't, the string copy is still avoided.

Couples this branch to the SubSequence base (#10640); regionContains stays as
the allocation-free primitive the view delegates to.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Isolates the duplicate-comment guard (dbType=null skips the first-word scan;
already-DD-commented SQL makes inject return early). The extractCommentContent
substring allocated 140 B/op; the in-place range/view scan is EA-elided (~0).
@threads(8), @fork(2), -prof gc; numbers in the class Javadoc. Throughput is
flat-to-slightly-up but within @fork(2) noise -- this path is scan-CPU-dominated,
so the win is the allocation, not throughput.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dougqh dougqh added tag: performance Performance related changes tag: no release notes Changes to exclude from release notes labels Jun 24, 2026
@dougqh dougqh added inst: jdbc JDBC instrumentation comp: database Database Monitoring tag: ai generated Largely based on code generated by an AI or LLM tag: performance Performance related changes tag: no release notes Changes to exclude from release notes labels Jun 24, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 14.76 s 14.63 s [-0.0%; +1.8%] (no difference)
startup:insecure-bank:tracing:Agent 13.61 s 13.69 s [-1.4%; +0.2%] (no difference)
startup:petclinic:appsec:Agent 16.95 s 16.76 s [+0.1%; +2.2%] (maybe worse)
startup:petclinic:iast:Agent 16.98 s 16.83 s [-0.0%; +1.8%] (no difference)
startup:petclinic:profiling:Agent 16.82 s 17.00 s [-2.2%; +0.1%] (no difference)
startup:petclinic:sca:Agent 16.58 s 16.56 s [-4.4%; +4.6%] (no difference)
startup:petclinic:tracing:Agent 16.02 s 16.00 s [-0.9%; +1.1%] (no difference)

Commit: bd0983fd · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

zulu-17 @fork(5), -prof gc: 23.5M -> 26.2M ops/s (~1.1x), 140 -> ~0 B/op.
Headline win is the allocation; @fork(5) tightens the earlier bimodal spread.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dougqh

dougqh commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Benchmark results (zulu-17, JDK 17.0.7, @Threads(8), @Fork(5), -prof gc)

SQLCommenterDuplicateCommentBenchmark.alreadyCommented:

throughput gc.alloc.rate.norm
before (substring) 23.5M ± 1.1M ops/s 140 B/op
after (range/view) 26.2M ± 1.5M ops/s ~0 B/op (10⁻⁵)

The extractCommentContent substring (140 B/op) is gone — the in-place range scan and the SubSequence view it flows through are both EA-elided. The headline win is the allocation: this path is dominated by the nine indexOf scans (CPU the alloc-removal doesn't touch), so throughput is a modest ~1.1× uplift, now resolved at @Fork(5). Benchmark Javadoc refreshed in bd0983f.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: database Database Monitoring inst: jdbc JDBC instrumentation tag: ai generated Largely based on code generated by an AI or LLM tag: no release notes Changes to exclude from release notes tag: performance Performance related changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant